Newspaper Document Analysis Featuring Connected Line Segmentation

نویسندگان

  • Phillip E. Mitchell
  • Hong Yan
چکیده

This paper presents an algorithm designed to segment and classify newspaper documents. A notable feature of this algorithm is the ability to detect lines in the document – including lines that are connected to other components. A bottom-up approach is used to segment the image into patterns, and then each pattern is classified into one of seven types. Complete regions are then formed from the classified patterns.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Persian Printed Document Analysis and Page Segmentation

This paper presents, a hybrid method, low-resolution and high-resolution, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments document image to a set of regions. By high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifyi...

متن کامل

Arabic Newspaper Page Segmentation

The aim of layout analysis is to extract the geometric structure from a document image. It consists of labeling homogenous regions of a document image. This paper describes the performance of segmentation algorithms and their adaptation in order to treat complex structured Arabic documents such as newspapers. Experimental tests have been carried out on four different phases of newspaper image a...

متن کامل

Page segmentation and text extraction from gray-scale images in microfilm format

The paper deals with a suitably designed system that is being used to separate textual regions from graphics regions and locate textual data from textured background. We presented a method based on edge detection to automatically locate text in some noise infected grayscale newspaper images with microfilm format. The algorithm first finds the appropriate edges of textual region using Canny edge...

متن کامل

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

متن کامل

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001